Expert-Informed Topic Models for Document Set Discovery
Abstract
The first step in many text-as-data studies is to find documents that address a specific topic within a larger document set. Researchers often rely on simple keyword searches to do this, even though this may introduce considerable selection bias. Such bias may be greater when researchers lack the domain knowledge required to make informed search decisions, for example, in cross-national research or in unfamiliar social contexts. We propose expert-informed topic modeling (EITM) as a hybrid approach to tackle this problem. EITM combines the validity of external domain knowledge captured through expert surveys with probabilistic topic models to help researchers identify subsets of documents that cover initially unknown domain-specific topics, such as events and debates, that belong to a researcher-defined master topic. EITM is a flexible and efficient method for the thematic selection of documents from large text corpora for further study. We benchmark and validate the method by discovering blog posts on the public role of religion in Australian, Swiss, and Turkish corpora, and we provide a complete workflow to guide researchers in the application of EITM in their own work.
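The selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' published workflow: it assumes a topic model has already been fitted, and the function names, the toy probabilities, and both thresholds (`topic_threshold`, `doc_threshold`) are hypothetical choices made only for illustration. Topics are scored by how much probability mass they place on expert-supplied terms, and documents are kept when enough of their topic mixture falls on the topics that pass.

```python
from typing import Dict, List, Sequence, Set, Tuple


def topic_expert_score(topic_words: Dict[str, float], expert_terms: Set[str]) -> float:
    """Probability mass that one topic places on the expert-supplied terms."""
    return sum(p for word, p in topic_words.items() if word in expert_terms)


def select_documents(
    doc_topic: Sequence[Sequence[float]],
    topic_word: Sequence[Dict[str, float]],
    expert_terms: Set[str],
    topic_threshold: float = 0.2,  # hypothetical cut-off for "expert-relevant" topics
    doc_threshold: float = 0.5,    # hypothetical cut-off for keeping a document
) -> Tuple[List[int], List[int]]:
    """Return (indices of relevant topics, indices of selected documents)."""
    relevant_topics = [
        k for k, words in enumerate(topic_word)
        if topic_expert_score(words, expert_terms) >= topic_threshold
    ]
    selected_docs = [
        d for d, theta in enumerate(doc_topic)
        if sum(theta[k] for k in relevant_topics) >= doc_threshold
    ]
    return relevant_topics, selected_docs


# Toy inputs standing in for a fitted topic model (all values are made up).
topic_word = [
    {"church": 0.3, "mosque": 0.2, "state": 0.1, "law": 0.4},
    {"football": 0.6, "match": 0.4},
    {"secularism": 0.25, "election": 0.75},
]
doc_topic = [
    [0.70, 0.20, 0.10],
    [0.10, 0.85, 0.05],
    [0.30, 0.40, 0.30],
]
# Terms that experts might name for the master topic "public role of religion".
expert_terms = {"church", "mosque", "secularism"}

relevant, selected = select_documents(doc_topic, topic_word, expert_terms)
print("relevant topics:", relevant)
print("selected documents:", selected)
```

In practice the two thresholds would have to be validated against hand-coded documents, since they determine the trade-off between missing relevant documents and admitting irrelevant ones.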
Similar resources
Unsupervised Document Classification with Informed Topic Models
Document classification is an important and common application in natural language processing. Scaling classification approaches to many targets faces a bottleneck in acquiring gold standard labels. In this work, we develop and evaluate a method for using informed topic models to noisily label documents, creating a noisy but usable set of labels for training discriminative classifiers. We inves...
Relational Topic Models for Document Networks
We develop the relational topic model (RTM), a model of documents and the links between them. For each pair of documents, the RTM models their link as a binary random variable that is conditioned on their contents. The model can be used to summarize a network of documents, predict links between them, and predict words within them. We derive efficient inference and learning algorithms based on v...
Topic Models for Semantically Annotated Document Collections
Increasingly, web document collections such as PubMed and DBPedia, but also social bookmarking systems, are annotated with semantic meta data. Given that the number of semantically annotated document collections is expected to increase in the near future, it is of interest to analyze if topic models might be able to play a larger role. Since most of the time, annotations are noisy and even huma...
Sparse Relational Topic Models for Document Networks
Learning latent representations is playing a pivotal role in machine learning and many application areas. Previous work on the relational topic model (RTM) has shown promise on learning latent topical representations for describing relational document networks and predicting pairwise links. However under a probabilistic formulation with normalization constraints, RTM could be ineffective in con...
Query-Document Relevance Topic Models
In this paper, we aim to deal with the deficiency of current information retrieval models by integrating the concept of relevance into the generation model from different topical aspects of the query. We study a series of relevance-dependent topic models. These models are adapted from the latent Dirichlet allocation model. They are distinguished by how the notation of query-document relevance, wh...
Journal
Journal title: Communication Methods and Measures
Year: 2021
ISSN: 1931-2458, 1931-2466
DOI: https://doi.org/10.1080/19312458.2021.1920008